Overview

Dataset statistics

Number of variables: 12
Number of observations: 891
Missing cells: 2
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 275.9 KiB
Average record size in memory: 317.1 B
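
The percentage and per-record figures above follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB = 1024 B; the variable names are illustrative only:

```python
# Figures copied from the dataset statistics above.
n_variables = 12
n_observations = 891
missing_cells = 2
total_size_kib = 275.9

# Share of missing cells over every cell in the table (assumed definition).
missing_pct = 100 * missing_cells / (n_variables * n_observations)

# Average bytes per record, assuming 1 KiB = 1024 B.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.3f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the missing share stays well below 0.1% and the record size rounds to the reported 317.1 B.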

Variable types

Numeric: 6
Categorical: 6

Alerts

Pclass is highly correlated with Fare (High correlation)
SibSp is highly correlated with FamilySize (High correlation)
Parch is highly correlated with FamilySize (High correlation)
Fare is highly correlated with Pclass (High correlation)
FamilySize is highly correlated with SibSp and 1 other field (High correlation)
Survived is highly correlated with Sex and 1 other field (High correlation)
Sex is highly correlated with Survived and 1 other field (High correlation)
Entitlement is highly correlated with Survived and 1 other field (High correlation)
Pclass is highly correlated with Fare and 1 other field (High correlation)
Age is highly correlated with Entitlement (High correlation)
SibSp is highly correlated with Parch and 2 other fields (High correlation)
Parch is highly correlated with SibSp and 1 other field (High correlation)
Embarked is highly correlated with Pclass (High correlation)
Entitlement is highly correlated with Survived and 2 other fields (High correlation)
FamilySize is highly correlated with SibSp and 2 other fields (High correlation)
FamilyCategory is highly correlated with SibSp and 1 other field (High correlation)
PassengerId is uniformly distributed (Uniform)
PassengerId has unique values (Unique)
SibSp has 608 (68.2%) zeros (Zeros)
Parch has 678 (76.1%) zeros (Zeros)
Fare has 15 (1.7%) zeros (Zeros)

Reproduction

Analysis started: 2021-10-25 18:58:20.987170
Analysis finished: 2021-10-25 18:58:39.104673
Duration: 18.12 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

PassengerId
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 45.5
Q1: 223.5
Median: 446
Q3: 668.5
95th percentile: 846.5
Maximum: 891
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5770265516
Kurtosis: -1.2
Mean: 446
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 397386
Variance: 66231
Monotonicity: Strictly increasing
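
Because PassengerId is simply the integers 1 through 891, the figures above can be reproduced by hand. A sketch using Python's standard `statistics` module, assuming the report uses the sample (n - 1) variance and linear-interpolation quantiles, which is what matches the reported numbers:

```python
import statistics

# PassengerId takes every integer from 1 to 891 exactly once.
ids = list(range(1, 892))

mean = statistics.mean(ids)
median = statistics.median(ids)
variance = statistics.variance(ids)   # sample variance (n - 1 denominator)
std = statistics.stdev(ids)           # square root of the sample variance
cv = std / mean                       # coefficient of variation

# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(x - median) for x in ids)

# Quartiles with linear interpolation between the closest ranks.
q1, _, q3 = statistics.quantiles(ids, n=4, method="inclusive")
iqr = q3 - q1

print(mean, median, variance, round(std, 6), round(cv, 10), mad, q1, q3, iqr)
```

With a population (n) denominator the variance would come out smaller than 66231, so the n - 1 assumption appears to be the one that fits this report.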
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
891                 1      0.1%
293                 1      0.1%
304                 1      0.1%
303                 1      0.1%
302                 1      0.1%
301                 1      0.1%
300                 1      0.1%
299                 1      0.1%
298                 1      0.1%
297                 1      0.1%
Other values (881)  881    98.9%

Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%
10     1      0.1%

Value  Count  Frequency (%)
891    1      0.1%
890    1      0.1%
889    1      0.1%
888    1      0.1%
887    1      0.1%
886    1      0.1%
885    1      0.1%
884    1      0.1%
883    1      0.1%
882    1      0.1%

Survived
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 57.4 KiB

Value  Count
0      549
1      342

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      549    61.6%
1      342    38.4%

Pclass
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 57.4 KiB

Value  Count
3      491
1      216
2      184

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
3      491    55.1%
1      216    24.2%
2      184    20.7%

Sex
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 60.7 KiB

Value   Count
male    577
female  314

Length

Max length: 6
Median length: 4
Mean length: 4.704826038
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Length

Histogram of lengths of the category

Pie chart

Value   Count  Frequency (%)
male    577    64.8%
female  314    35.2%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 88
Distinct (%): 9.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.39749719
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0.42
5th percentile: 6
Q1: 22
Median: 28
Q3: 35
95th percentile: 54
Maximum: 80
Range: 79.58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 13.01882099
Coefficient of variation (CV): 0.4428547404
Kurtosis: 0.9893613519
Mean: 29.39749719
Median Absolute Deviation (MAD): 6
Skewness: 0.5019999752
Sum: 26193.17
Variance: 169.4896999
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value              Count  Frequency (%)
28                 152    17.1%
29                 64     7.2%
24                 30     3.4%
22                 27     3.0%
18                 26     2.9%
19                 25     2.8%
30                 25     2.8%
21                 24     2.7%
26                 24     2.7%
25                 23     2.6%
Other values (78)  471    52.9%

Value  Count  Frequency (%)
0.42   1      0.1%
0.67   1      0.1%
0.75   2      0.2%
0.83   2      0.2%
0.92   1      0.1%
1      7      0.8%
2      10     1.1%
3      6      0.7%
4      10     1.1%
5      4      0.4%

Value  Count  Frequency (%)
80     1      0.1%
74     1      0.1%
71     2      0.2%
70.5   1      0.1%
70     2      0.2%
66     1      0.1%
65     3      0.3%
64     2      0.2%
63     2      0.2%
62     4      0.4%

SibSp
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
4      18     2.0%
3      16     1.8%
8      7      0.8%
5      5      0.6%

Value  Count  Frequency (%)
0      608    68.2%
1      209    23.5%
2      28     3.1%
3      16     1.8%
4      18     2.0%
5      5      0.6%
8      7      0.8%

Value  Count  Frequency (%)
8      7      0.8%
5      5      0.6%
4      18     2.0%
3      16     1.8%
2      28     3.1%
1      209    23.5%
0      608    68.2%

Parch
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3815937149
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Negative: 0
Negative (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
5      5      0.6%
3      5      0.6%
4      4      0.4%
6      1      0.1%

Value  Count  Frequency (%)
0      678    76.1%
1      118    13.2%
2      80     9.0%
3      5      0.6%
4      4      0.4%
5      5      0.6%
6      1      0.1%

Value  Count  Frequency (%)
6      1      0.1%
5      5      0.6%
4      4      0.4%
3      5      0.6%
2      80     9.0%
1      118    13.2%
0      678    76.1%

Fare
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.20420797
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
8.05                43     4.8%
13                  42     4.7%
7.8958              38     4.3%
7.75                34     3.8%
26                  31     3.5%
10.5                24     2.7%
7.925               18     2.0%
7.775               16     1.8%
26.55               15     1.7%
0                   15     1.7%
Other values (238)  615    69.0%

Value   Count  Frequency (%)
0       15     1.7%
4.0125  1      0.1%
5       1      0.1%
6.2375  1      0.1%
6.4375  1      0.1%
6.45    1      0.1%
6.4958  2      0.2%
6.75    2      0.2%
6.8583  1      0.1%
6.95    1      0.1%

Value     Count  Frequency (%)
512.3292  3      0.3%
263       4      0.4%
262.375   2      0.2%
247.5208  2      0.2%
227.525   4      0.4%
221.7792  1      0.1%
211.5     1      0.1%
211.3375  3      0.3%
164.8667  2      0.2%
153.4625  3      0.3%

Embarked
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 2
Missing (%): 0.2%
Memory size: 57.4 KiB

Value  Count
S      644
C      168
Q      77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Common Values

Value      Count  Frequency (%)
S          644    72.3%
C          168    18.9%
Q          77     8.6%
(Missing)  2      0.2%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
s      644    72.4%
c      168    18.9%
q      77     8.7%

Entitlement
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 17
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 59.8 KiB

Value              Count
Mr.                517
Miss.              182
Mrs.               125
Master.            40
Dr.                7
Other values (12)  20

Length

Max length: 9
Median length: 3
Mean length: 3.76318743
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8
Unique (%): 0.9%

Sample

1st row: Mr.
2nd row: Mrs.
3rd row: Miss.
4th row: Mrs.
5th row: Mr.

Common Values

Value             Count  Frequency (%)
Mr.               517    58.0%
Miss.             182    20.4%
Mrs.              125    14.0%
Master.           40     4.5%
Dr.               7      0.8%
Rev.              6      0.7%
Major.            2      0.2%
Col.              2      0.2%
Mlle.             2      0.2%
Don.              1      0.1%
Other values (7)  7      0.8%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
mr                517    58.0%
miss              182    20.4%
mrs               125    14.0%
master            40     4.5%
dr                7      0.8%
rev               6      0.7%
col               2      0.2%
major             2      0.2%
mlle              2      0.2%
jonkheer          1      0.1%
Other values (7)  7      0.8%

FamilySize
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 8
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.882154882
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 2
95th percentile: 6
Maximum: 9
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.488014386
Coefficient of variation (CV): 0.7905908274
Kurtosis: 5.892240474
Mean: 1.882154882
Median Absolute Deviation (MAD): 0
Skewness: 2.294513014
Sum: 1677
Variance: 2.214186812
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)

Value  Count  Frequency (%)
1      534    59.9%
2      166    18.6%
3      84     9.4%
4      56     6.3%
6      30     3.4%
9      9      1.0%
7      7      0.8%
5      5      0.6%

Value  Count  Frequency (%)
1      534    59.9%
2      166    18.6%
3      84     9.4%
4      56     6.3%
5      5      0.6%
6      30     3.4%
7      7      0.8%
9      9      1.0%

Value  Count  Frequency (%)
9      9      1.0%
7      7      0.8%
6      30     3.4%
5      5      0.6%
4      56     6.3%
3      84     9.4%
2      166    18.6%
1      534    59.9%

FamilyCategory
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 66.7 KiB

Value                Count
Single person        596
Couple               161
Relatives            83
Couple and Children  51

Length

Max length: 19
Median length: 13
Mean length: 11.70594837
Min length: 6

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Couple
2nd row: Couple
3rd row: Single person
4th row: Couple
5th row: Relatives

Common Values

Value                Count  Frequency (%)
Single person        596    66.9%
Couple               161    18.1%
Relatives            83     9.3%
Couple and Children  51     5.7%

Length

Histogram of lengths of the category

Pie chart

Value      Count  Frequency (%)
person     596    37.5%
single     596    37.5%
couple     212    13.3%
relatives  83     5.2%
children   51     3.2%
and        51     3.2%

Interactions

[Interaction plots (36 figures); images not preserved.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
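
The rank-then-correlate recipe above can be sketched in plain Python. This is an illustrative implementation, not the report's own code; the helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and ties are assumed to receive the average of the ranks they span.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
print(spearman(x, y), pearson(x, y))
```

For this cubic relationship the ranks agree perfectly, so ρ comes out at (numerically) 1 while Pearson's r stays below 1.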

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
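
As a hedged illustration of this formula and of the invariance property mentioned above, the following sketch computes r directly and then again after shifting and rescaling both variables. The function name `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    so plain sums of deviations are enough.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2.0 * a + 5.0 for a in x]
y_shifted = [0.5 * b - 3.0 for b in y]
r_transformed = pearson_r(x_shifted, y_shifted)

# A perfectly decreasing linear relationship gives r = -1.
r_negative = pearson_r(x, [-a for a in x])

print(r_original, r_transformed, r_negative)
```

The first two values agree up to floating-point rounding, which is the location and scale invariance in practice.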

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
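
The pair-counting definition can be written out directly. The sketch below implements exactly that definition (often called tau-a); it is an assumption that this matches the report's numbers, since library routines such as SciPy's `kendalltau` default to the tie-adjusted tau-b variant. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables disagree on the order
    n = len(x)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(x, y))
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.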

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
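
A minimal sketch of a bias-corrected Cramér's V, following the correction as it is commonly stated from Bergsma (2013). It is not taken from the report's source code, and the function name `cramers_v_corrected` and the toy data are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_label, row_count in row_totals.items():
        for col_label, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((row_label, col_label), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Subtract the expected bias of phi^2 and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Perfect association: the second label is fully determined by the first.
a = ["x"] * 50 + ["y"] * 50
b = ["p"] * 50 + ["q"] * 50
v_perfect = cramers_v_corrected(a, b)

# Independence: every combination occurs equally often.
c = ["x", "x", "y", "y"] * 25
d = ["p", "q", "p", "q"] * 25
v_independent = cramers_v_corrected(c, d)

print(v_perfect, v_independent)
```

The `max(0.0, ...)` clamp is what keeps the corrected statistic from going negative when the observed association is weaker than the expected bias.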

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

Index  PassengerId  Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked  Entitlement  FamilySize  FamilyCategory
0      1            0         3       male    22.0  1      0      7.2500   S         Mr.          2           Couple
1      2            1         1       female  38.0  1      0      71.2833  C         Mrs.         1           Couple
2      3            1         3       female  26.0  0      0      7.9250   S         Miss.        1           Single person
3      4            1         1       female  35.0  1      0      53.1000  S         Mrs.         2           Couple
4      5            0         3       male    35.0  0      0      8.0500   S         Mr.          2           Relatives
5      6            0         3       male    29.0  0      0      8.4583   Q         Mr.          3           Relatives
6      7            0         1       male    54.0  0      0      51.8625  S         Mr.          1           Single person
7      8            0         3       male    2.0   3      1      21.0750  S         Master.      4           Single person
8      9            1         3       female  27.0  0      2      11.1333  S         Mrs.         6           Couple and Children
9      10           1         2       female  14.0  1      0      30.0708  C         Mrs.         2           Couple

Last rows

Index  PassengerId  Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked  Entitlement  FamilySize  FamilyCategory
881    882          0         3       male    33.0  0      0      7.8958   S         Mr.          1           Single person
882    883          0         3       female  22.0  0      0      10.5167  S         Miss.        1           Single person
883    884          0         2       male    28.0  0      0      10.5000  S         Mr.          1           Single person
884    885          0         3       male    25.0  0      0      7.0500   S         Mr.          1           Single person
885    886          0         3       female  39.0  0      5      29.1250  Q         Mrs.         5           Couple and Children
886    887          0         2       male    27.0  0      0      13.0000  S         Rev.         1           Single person
887    888          1         1       female  19.0  0      0      30.0000  S         Miss.        3           Relatives
888    889          0         3       female  28.0  1      2      23.4500  S         Miss.        2           Single person
889    890          1         1       male    26.0  0      0      30.0000  C         Mr.          1           Single person
890    891          0         3       male    32.0  0      0      7.7500   Q         Mr.          1           Single person